Patterns in Program References

Authors

  • Walter F. Freiberger
  • Ulf Grenander
  • Paul D. Sampson
Abstract

This paper describes a study of some of the characteristics of program referencing patterns. Program behavior is investigated by constructing stochastic models for the page reference mechanism and evaluating the validity of the assumptions made through comparison with empirical results. The notion of a regime process is shown to play a useful role in describing the observed phenomena mathematically. The study falls within the realm of a rapidly growing field of computer science known as compumetrics, where quantitative and qualitative methods are being applied to the study and evaluation of computer performance.

Introduction

The execution of a program in a multiprogrammed system has to be interrupted frequently for reference to information stored in different levels of the storage hierarchy. To discuss the strategy of the basic decision algorithm appropriate for the system, it is necessary to know something about the manner in which these references to stored information are made. When dealing with a large program, such as an assembler or a compiler, it is impossible in practice to predict the references deterministically, and it has long been recognized that one has to resort to probabilistic models in this context.

The choice of the correct probabilistic model is far from obvious. The purpose of this paper is to elucidate the problem by considering an analytic model and to relate the results of the analysis to actual measurements. Our aim is to improve our understanding of the stochastic structure underlying the phenomena, not yet to suggest methods for improving the decision algorithms of existing operating systems. The latter will be possible once the model has been firmly established and validated. The model proposed in this paper should be regarded as only a first approximation to the best one.
We could no doubt have obtained higher accuracy by choosing the stochastic process appearing in the next section to be more general, but this seemed to us not to be called for at the present preliminary stage of our investigations.

We start from some simple hypotheses about program references, in order to arrive at a model that is more intrinsic than pure curve-fitting would be:

1. Under multiprogramming, each program is given a “slice” of execution time that cannot be exceeded but may well not be used up. The size of the time slice can vary considerably among different systems, but will normally be enough for many thousands of operations. When we change from one program to another the probabilistic structure can also be expected to change, at least in an environment with a heterogeneous load. These breakpoints, and their distribution in time, will play an important role in any realistic model. Some other work [1-3] is in progress to develop statistical methods for studying this problem.

2. Between two breakpoints one can expect more homogeneous behavior, both because programs are executed sequentially, except for branching, and because the simplest kind of string information is also sequential. One might compare this with the notion of locality [4].

3. Branching makes for less homogeneous behavior and causes one of the most obvious aspects of program behavior, viz., looping. This will lead to an (approximately) periodic appearance, and with loops within loops there can be many periods present.

4. On the other hand, most programs have a characteristic behavior at their beginning (and usually also at their end). Initialization will be needed to set up tables, specify parameters, deal with macros and subroutines, etc. This also leads to heterogeneous behavior.

There are, of course, many other phenomena of program behavior that are known empirically although not yet quantified.
For the moment, however, we shall limit ourselves to those mentioned above.

To be more specific, we consider an operating system that employs the concept of paging from and to virtual memory. Pages are assumed to be of constant size and the entire storage hierarchy (main memory and auxiliary storage) in which paging takes place is taken to contain the collection $P = \{1, 2, 3, \ldots, M\}$ of pages, where $M$ is the total number of pages. Let $m$ be the maximum number of pages that can reside in main memory and denote the set of these pages by $\{1, 2, 3, \ldots, m\}$. During execution the operating system makes references to memory and we shall denote by $p = (\ldots, p_{-1}, p_0, p_1, \ldots)$ the reference string of successive pages (not necessarily distinct) referred to. Under demand paging, a page that is referenced but is not resident in main memory will be brought in from auxiliary storage, usually in place of some other page which has to be pushed out, according to the particular paging algorithm implemented in the operating system.

A common class of such algorithms is the so-called stack algorithm, of which the most popular is LRU (Least Recently Used); let us define stack algorithms as in [4]. The set $D = \{1, 2, \ldots, J\}$ denotes the set of pages of a given program, so that the members of the reference string satisfy $p_k \in D$. The program has been allocated a main memory space of $m$ pages, where $1 \le m \le J$. We call a subset $S$ of $D$ such that $S$ contains $m$ or fewer pages a possible memory state. Let $r$ be a reference string and $A$ an allocation algorithm, and let $S(A, m, r)$ denote the memory state after $A$ has processed $r$ under demand paging in an initially empty main memory of size $m$. Then $A$ is a stack algorithm if $S(A, m-1, r)$ is a subset of $S(A, m, r)$. That is, the contents of the $(m-1)$-page memory are always contained in the $m$-page memory, so that the memory states are “stacked up” on one another.
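The inclusion property in this definition can be checked empirically for LRU. The following is a minimal Python sketch, not taken from the paper: the function names and the reference string `refs` are hypothetical, and main memory is modeled as an ordered dictionary whose first key is the least recently used page.

```python
from collections import OrderedDict


def lru_memory_states(refs, m):
    """Return the set of resident pages after each reference under LRU
    demand paging, starting from an empty main memory of m page frames."""
    resident = OrderedDict()  # keys ordered from least to most recently used
    states = []
    for page in refs:
        if page in resident:
            resident.move_to_end(page)        # hit: page becomes most recent
        else:
            if len(resident) == m:
                resident.popitem(last=False)  # fault: evict least recently used
            resident[page] = True
        states.append(frozenset(resident))
    return states


def has_inclusion_property(refs, max_m):
    """Check that S(A, m-1, r) is a subset of S(A, m, r) for 2 <= m <= max_m,
    after every prefix of the reference string."""
    per_size = {m: lru_memory_states(refs, m) for m in range(1, max_m + 1)}
    return all(
        small <= large
        for m in range(2, max_m + 1)
        for small, large in zip(per_size[m - 1], per_size[m])
    )


refs = [1, 2, 3, 1, 4, 2, 5, 1, 2, 3]  # hypothetical reference string
print(has_inclusion_property(refs, 5))
```

A check of this kind only tests the property on one string; it does not prove it for the algorithm in general.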
After considering this definition, it is clear why, for such stack algorithms, performance is crucially dependent on the behavior of the distance string, defined as follows. Suppose that at time $t$ page $r_t = i$ has been referenced, and that the next reference to page $i$ occurs at time $t + n_t + 1$ (see below); that is, between these two references to page $i$ there have been $n_t$ page references, but none to page $i$. Let $d_t$ denote the number of distinct page references among these $n_t$ references. Then the string $d = (\ldots, d_{-1}, d_0, d_1, \ldots)$ is called the distance string corresponding to the reference string $r$.

$$r_t = i,\quad \underbrace{r_{t+1},\ r_{t+2},\ \ldots,\ r_{t+n_t}}_{n_t \text{ page references, none of them } = i},\quad r_{t+n_t+1} = i$$

A page exception will occur each time that $d_t \ge m$, since page $i$ then no longer lies among the $m$ most recently used pages. Consider a substring $(r_a, r_{a+1}, r_{a+2}, \ldots, r_b)$ of the reference string and denote by $N_{ab}$ the number of pages that had to be brought in from auxiliary storage during the interval $[a, b]$. The ratio $R_{ab} = N_{ab}/(b - a + 1)$ is called the paging rate and is one of the criteria to be evaluated when judging the performance of a paged computing system. Notice that $R_{ab}$ is an empirical quantity, not a parameter of any model. It should be mentioned, in passing, that paging rates can be measured in at least two ways: either with respect to the flow of instructions (the unit being a time interval between two successive instruction executions) or with respect to new page references. Because the first method is more informative when studying overhead caused by page faults, we adopt it in this paper.

Model of program references

Let us measure time, as indicated above, in terms of the number of instructions executed.
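Before the model is developed, the distance string and paging rate defined above can be made concrete. The following is a minimal Python sketch rather than the authors' procedure: it counts time in page references instead of executed instructions (a simplifying assumption), assumes LRU replacement with an initially empty memory so that every first reference is a fault, and uses a made-up reference string `refs`. All function names are hypothetical.

```python
def distance_string(refs):
    """d[t] = number of distinct pages referenced strictly between refs[t]
    and the next reference to the same page; None if the page never recurs."""
    d = [None] * len(refs)
    for t, page in enumerate(refs):
        seen = set()
        for later in refs[t + 1:]:
            if later == page:
                d[t] = len(seen)
                break
            seen.add(later)
    return d


def lru_fault_flags(refs, m):
    """flags[t] is True when reference t causes a page exception in an
    m-page memory: either a first reference, or a re-reference whose
    distance from the previous reference to the same page is >= m."""
    d = distance_string(refs)
    flags = [True] * len(refs)  # first references always fault
    last_seen = {}
    for t, page in enumerate(refs):
        if page in last_seen:
            flags[t] = d[last_seen[page]] >= m
        last_seen[page] = t
    return flags


def paging_rate(refs, m, a, b):
    """R_ab = N_ab / (b - a + 1) over the inclusive interval [a, b]."""
    flags = lru_fault_flags(refs, m)
    return sum(flags[a:b + 1]) / (b - a + 1)


refs = [1, 2, 3, 1, 4, 2, 5, 1, 2, 3]  # hypothetical reference string
print(distance_string(refs))
print(paging_rate(refs, 3, 0, len(refs) - 1))
```

The quadratic scan in `distance_string` is chosen for readability; a stack-based one-pass method would be preferable for long traces.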
We shall not be concerned with happenings during individual instruction executions, but rather with nonoverlapping and contiguous time intervals of length $T$, where $T$ is the window of the working set: the working set $W(t)$ is the binary vector with its $i$th component equal to one if the $i$th page has, zero if it has not, been referenced in the $t$th window $[(t-1)T, tT]$. In other words, $W(t)$ can be looked upon either as a binary vector or as the set whose indicator function is this vector. The number of elements in $W(t)$ is

$$\#[W(t)] = \sum_{i=1}^{M} e_i(t) = \|W(t)\| \quad \text{(the Hamming norm)},$$

where $e_i(t)$ is the indicator function of page $i$.

Let us make the time parameter $t$ continuous and treat $W(t)$ in terms of the “birth and death process” $e_i(t)$ (considering one $i$ only):

$$e(t) = \begin{cases} 1 & \text{with mortality } \mu \text{ per time unit,} \\ 0 & \text{with birthrate } \lambda \text{ per time unit.} \end{cases}$$

Here the time unit will be equal to the window size and the birthrate is equivalent to the reentry rate of pages. We have the conditional probabilities $P_1(t) = P[e(t) = 1 \mid e(0) = 1]$ and $P_0(t) = P[e(t) = 1 \mid e(0) = 0]$. Omitting subscripts, since both $P$ obey the same differential equation but with different boundary conditions, we have (see [5], p. 459)

$$P(t+h) = P(t)(1 - \mu h) + [1 - P(t)]\lambda h + o(h),$$
$$[P(t+h) - P(t)]/h = \lambda - P(t)(\mu + \lambda) + o(1),$$
$$(dP/dt) + (\lambda + \mu)P = \lambda,$$
$$P(t) = c \exp[-(\lambda + \mu)t] + \lambda/(\lambda + \mu).$$

We must have that $P_0(t) \to 0$ as $t \to 0$ and $P_1(t) \to 1$ as $t \to 0$; therefore

$$P_0(t) = \frac{\lambda}{\lambda + \mu}\{1 - \exp[-(\lambda + \mu)t]\}, \qquad P_1(t) = \frac{\lambda}{\lambda + \mu} + \frac{\mu}{\lambda + \mu}\exp[-(\lambda + \mu)t],$$

and we get the equilibrium probability ($t \to \infty$)

$$P[e(t) = 1] = \lambda/(\lambda + \mu) = p, \text{ say},$$

which holds of course for each page, with $p_i = \lambda_i/(\lambda_i + \mu_i)$ for page $i$. The equilibrium probability ($t \to \infty$) for the size of the working set can be denoted by $P[w(t) = k] = w_k$, say; the whole probability distribution for the working set is denoted by $\{w\}$.
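The solution of the differential equation above can be checked numerically. The sketch below is illustrative only: the rate values are hypothetical, the constant $c$ is fixed from the initial condition $P(0) = e(0)$, and the closed form is compared with a plain forward-Euler integration of $dP/dt = \lambda - (\lambda + \mu)P$.

```python
import math


def p_closed_form(t, lam, mu, start):
    """P[e(t) = 1 | e(0) = start], with start equal to 0 or 1.
    The constant c in the general solution is start - lam / (lam + mu)."""
    p_eq = lam / (lam + mu)
    return p_eq + (start - p_eq) * math.exp(-(lam + mu) * t)


def p_euler(t, lam, mu, start, steps=100_000):
    """Forward-Euler integration of dP/dt = lam - (lam + mu) * P."""
    h = t / steps
    prob = float(start)
    for _ in range(steps):
        prob += h * (lam - (lam + mu) * prob)
    return prob


lam, mu = 0.3, 0.7  # hypothetical birth and mortality rates per window
for start in (0, 1):
    print(start, p_closed_form(2.0, lam, mu, start), p_euler(2.0, lam, mu, start))
```

For large $t$ both starting conditions approach the same equilibrium value $\lambda/(\lambda + \mu)$, which is what makes the equilibrium distribution independent of the initial state.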
Let $b(\alpha)$ denote the Bernoulli distribution with parameter $\alpha$; then $\{w\}$ will be the convolution of $M$ distributions with $\alpha = \lambda_i/(\lambda_i + \mu_i)$:

$$\{w\} = \mathop{*}_{i=1}^{M} b[\lambda_i/(\lambda_i + \mu_i)].$$

The generating function of $\{w\}$ is $\prod_{i=1}^{M} (1 - p_i + p_i z) = \exp \ldots$
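The convolution of Bernoulli distributions described above can be evaluated directly. The following Python sketch is an illustration under stated assumptions: the per-page rates in `rates` are invented, pages are treated as independent (which is what the convolution presupposes), and the function name is hypothetical. Each pass multiplies the running polynomial by one factor $(1 - p_i + p_i z)$, so the result holds the coefficients of the generating function.

```python
def working_set_size_distribution(p):
    """Return w, where w[k] = P[working-set size = k], given the equilibrium
    probabilities p[i] that page i belongs to the working set."""
    w = [1.0]  # distribution of an empty sum: size 0 with probability 1
    for p_i in p:
        nxt = [0.0] * (len(w) + 1)
        for k, w_k in enumerate(w):
            nxt[k] += w_k * (1.0 - p_i)  # page i absent: size stays k
            nxt[k + 1] += w_k * p_i      # page i present: size becomes k + 1
        w = nxt
    return w


rates = [(0.5, 0.5), (0.2, 0.8), (0.9, 0.1)]  # hypothetical (lambda_i, mu_i)
p = [lam / (lam + mu) for lam, mu in rates]
w = working_set_size_distribution(p)
mean_size = sum(k * w_k for k, w_k in enumerate(w))
print(w, mean_size)
```

The mean working-set size computed this way equals the sum of the $p_i$, which gives a quick sanity check on the coefficients.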




Journal:
  • IBM Journal of Research and Development

Volume 19  Issue 

Pages  -

Publication date: May 1975